Locality Optimizations for Parallel Computing Using Data Access Information
Abstract
Given the large communication overheads characteristic of modern parallel machines, optimizations that improve locality by executing tasks close to data that they will access may improve the performance of parallel computations. This paper describes our experience automatically applying locality optimizations in the context of Jade, a portable, implicitly parallel programming language designed for exploiting task-level concurrency. Jade programmers start with a program written in a standard serial, imperative language, then use Jade constructs to declare how parts of the program access data. The Jade implementation uses this data access information to automatically extract the concurrency and apply locality optimizations. We present performance results for several Jade applications running on the Stanford DASH machine. We use these results to characterize the overall performance impact of the locality optimizations. In our application set the locality optimization level has little effect on the performance of two of the applications and a large effect on the performance of the rest of the applications. We also found that, if the locality optimization level had a significant effect on the performance, the maximum performance was obtained when the programmer explicitly placed tasks on processors rather than relying on the scheduling algorithm inside the Jade implementation.
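To make the programming model concrete, the sketch below shows, in C-like pseudocode, how a Jade-style task might declare the data it will access. The withonly { ... } do (...) construct and the rd()/wr() declarations follow the general style described in the Jade literature, but the exact syntax and all identifiers here (block_t, relax_block, relax_all) are illustrative assumptions, not code from the paper.

    /* Illustrative sketch only: Jade-style access declarations in C-like
       pseudocode.  block_t, relax_block(), and relax_all() are hypothetical
       names invented for this example. */
    typedef struct { double cells[1024]; } block_t;

    void relax_block(block_t *b);   /* unchanged serial computation on one block */

    void relax_all(block_t *blocks, int nblocks)
    {
        for (int i = 0; i < nblocks; i++) {
            block_t *b = &blocks[i];
            /* Declare up front that this task will read and write only block b.
               The Jade implementation uses such declarations to extract
               concurrency (tasks touching disjoint blocks may run in parallel)
               and to schedule each task close to the data it accesses. */
            withonly { rd(b); wr(b); } do (b) {
                relax_block(b);   /* serial task body from the original program */
            }
        }
    }

As the abstract notes, when the locality optimization level mattered for an application, the best performance in this study was obtained when the programmer explicitly placed such tasks on processors rather than relying on the locality-based scheduling inside the Jade implementation.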
Similar Resources
Communication Optimizations for Parallel Computing Using Data Access Information
Given the large communication overheads characteristic of modern parallel machines, optimizations that eliminate, hide or parallelize communication may improve the performance of parallel computations. This paper describes our experience automatically applying communication optimizations in the context of Jade, a portable, implicitly parallel programming language designed for exploiting task-level concurrency...
Cacheminer: A Runtime Approach to Exploit Cache Locality on SMP
Exploiting cache locality of parallel programs at runtime is a complementary approach to a compiler optimization. This is particularly important for those applications with dynamic memory access patterns. We propose a memory-layout oriented technique to exploit cache locality of parallel loops at runtime on Symmetric Multiprocessor (SMP) systems. Guided by application-dependent and targeted ar...
Hierarchical Domain Partitioning For Hierarchical Architectures
The history of parallel computing shows that good performance is heavily dependent on data locality. Prior knowledge of data access patterns allows for optimizations that reduce data movement, achieving lower data access latencies. Compilers and runtime systems, however, have difficulties in speculating on locality issues among threads. Future multicore architectures are likely to present a hie...
On the Cache Access Behavior of OpenMP Applications
The widening gap between memory and processor speed results in increasing requirements to improve cache utilization. This issue is especially critical for OpenMP execution, which usually exploits fine-grained parallelism. The work presented in this paper studies the cache behavior of OpenMP applications in order to detect potential optimizations with respect to cache locality. This study is base...
Parallel Data Mining for Association Rules on Shared-Memory Multiprocessors
In this paper we present a new parallel algorithm for data mining of association rules on shared-memory multiprocessors. We study the degree of parallelism, synchronization, and data locality issues, and present optimizations for fast frequency computation. Experiments show that a significant improvement of performance is achieved using our proposed optimizations. We also achieved good speed-up...
Journal: International Journal of High Speed Computing
Volume: 9, Issue: -
Pages: -
Published: 1997